Week 4: Visualizing Time Series and Trends

Emorie D Beck

Time Series

What is a time series?

“a series of values of a quantity obtained at successive times, often with equal intervals between them”

What is a time series?

# A tibble: 100 × 2
       t value
   <int> <dbl>
 1     1     5
 2     2     5
 3     3     3
 4     4     4
 5     5     4
 6     6     4
 7     7     4
 8     8     3
 9     9     3
10    10     2
11    11     3
12    12     4
13    13     4
14    14     3
15    15     3
# … with 85 more rows
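
A tibble with this shape can be sketched from simulated data. The values below are made up (a bounded random walk on a 1-5 scale) and are not the series printed above; `sim_ts` and the seed are hypothetical choices for illustration.

```r
library(tibble)

set.seed(290)  # arbitrary seed so the sketch is reproducible

# 99 random steps of -1, 0, or +1 between successive time points
steps <- sample(c(-1, 0, 1), size = 99, replace = TRUE)

# Accumulate the steps from a starting value of 5, clamping to the 1-5 range
value <- Reduce(
  function(prev, s) min(max(prev + s, 1), 5),
  steps,
  accumulate = TRUE,
  5
)

# One row per equally spaced time point
sim_ts <- tibble(t = 1:100, value = value)
sim_ts
```

The defining feature is the ordered index `t`: shuffling the rows would leave the distribution of `value` unchanged but destroy the time series.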

Who should care about time series?

People who:

  • study longitudinal change (e.g., development)

  • study variability (e.g., experience sampling, passive sensing)

  • run experiments with multiple trials

  • study cohort or age differences

  • run simulations (e.g., trace plots in Bayesian models)

  • Time is everywhere, and ignoring it can be problematic

What will we cover with time series:

  • Univariate time series
  • Multivariate time series
  • Connected scatter plots
  • Smoothing
  • Detrended time series

This isn’t the first time we’ve seen time series, but today we’ll focus on telling stories with time series

Univariate and Multivariate Time Series

Why visualize a time series if you don’t care about the trend?

  • Plotting the raw series is another way to describe your data, and it makes it easy to see whether something went wrong

Univariate and Multivariate Time Series

  • How you visualize the trends you are trying to uncover in a time series will depend on the research question you are asking
    • e.g., very basic time series visualizations are great for descriptives
    • But to include them in presentations or papers, we usually want to add more affordances that make clear that nothing went wrong
    • Affordances include text, shading, and more, in alignment with Gestalt principles and how we process different aspects of visualizations

But First, Our Data

  • These are some Experience Sampling Method data I collected during my time in graduate school (Beck & Jackson, 2022)
  • In that paper I built personalized machine learning models of behaviors and experiences from sets of:
    • psychological
    • situational
    • and time variables
  • We also saw these in Week 2
library(tidyverse) # dplyr, tidyr, and ggplot2 are used throughout
load(url("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/05-week5-time-series/01-data/ipcs_data.RData"))
ipcs_data %>% 
  print(n = 6)
# A tibble: 4,222 × 70
  SID   Full_D…¹ afraid angry atten…² content excited goaldir guilty happy proud
  <chr> <chr>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl> <dbl>
1 02    2018-10…      1     2       4       4       2       5      2     3     4
2 02    2018-10…      1     1       4       3       2       5      1     3     3
3 02    2018-10…      2     1       2       3       1       2      2     3     2
4 02    2018-10…      2     2       4       3       2       4      1     3     3
5 02    2018-10…      2     1       4       4       3       4      1     3     3
6 02    2018-10…      2     1       4       4       2       4      1     3     3
# … with 4,216 more rows, 59 more variables: purposeful <dbl>,
#   agreeableness_Compassion <dbl>, agreeableness_Respectfulness <dbl>,
#   agreeableness_Trust <dbl>, conscientiousness_Organization <dbl>,
#   conscientiousness_Productiveness <dbl>,
#   conscientiousness_Responsibility <dbl>, extraversion_Assertiveness <dbl>,
#   extraversion_Energy.Level <dbl>, extraversion_Sociability <dbl>,
#   neuroticism_Anxiety <dbl>, neuroticism_Depression <dbl>, …

But First, Our Data

Let’s simplify a bit and say we care about 4 different states for two people:

ipcs_data %>%
  filter(SID == c("216")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  print(n = 10)
# A tibble: 108 × 7
   SID   Full_Date         beep excited goaldir content guilty
   <chr> <chr>            <int>   <dbl>   <dbl>   <dbl>  <dbl>
 1 216   2019-12-09 12:08     1       3       4       4      2
 2 216   2019-12-09 16:06     2       3       4       4      1
 3 216   2019-12-09 20:14     3       3       3       3      1
 4 216   2019-12-10 12:02     4       3       4       3      2
 5 216   2019-12-10 16:08     5       2       4       3      2
 6 216   2019-12-10 20:05     6       3       3       4      2
 7 216   2019-12-10 8:01      7       2       4       3      2
 8 216   2019-12-11 12:29     8       3       3       3      2
 9 216   2019-12-11 16:05     9       2       3       3      2
10 216   2019-12-11 20:10    10       3       4       3      1
# … with 98 more rows
ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  print(n = 10)
# A tibble: 48 × 7
   SID   Full_Date         beep excited goaldir content guilty
   <chr> <chr>            <int>   <dbl>   <dbl>   <dbl>  <dbl>
 1 02    2018-10-22 13:23     1       2       5       4      2
 2 02    2018-10-22 17:23     2       2       5       3      1
 3 02    2018-10-23 10:00     3       1       2       3      2
 4 02    2018-10-23 13:24     4       2       4       3      1
 5 02    2018-10-23 17:53     5       3       4       4      1
 6 02    2018-10-24 10:00     6       2       4       4      1
 7 02    2018-10-24 13:23     7       2       4       3      1
 8 02    2018-10-24 17:29     8       4       4       4      1
 9 02    2018-10-24 21:23     9       3       3       3      2
10 02    2018-10-25 13:29    10       3       3       4      2
# … with 38 more rows
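
Note that `Full_Date` is stored as character (`<chr>`), so ordering it alphabetically can misplace times with a single-digit hour; row 7 for participant 216 above (8:01 listed after 20:05 on the same day) looks consistent with that. A minimal sketch of parsing the strings into date-times, using four values copied from that output. The format string and `tz = "UTC"` are assumptions, since the slides do not state the time zone.

```r
# Full_Date strings copied from the printed output for SID 216
full_date <- c("2019-12-10 12:02", "2019-12-10 16:08",
               "2019-12-10 20:05", "2019-12-10 8:01")

# Character sorting compares strings left to right, so "8:01" sorts after "20:05"
sort(full_date)

# Parse to POSIXct; the format and time zone are assumptions
parsed <- as.POSIXct(full_date, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Ordering by the parsed values puts the 8:01 beep first
full_date[order(parsed)]
```

With a parsed column, the x-axis could show real elapsed time instead of the beep count, which matters when the gaps between beeps are unequal.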

Univariate Time Series

It’s hard to make much sense of this because we end up trying to draw the line connecting the points with our eyes:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = excited)) + 
    geom_point() + 
    theme_classic()

Univariate Time Series

It’s easier to make sense of this because we can just follow the line:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = excited)) + 
    geom_line() + 
    geom_point() + 
    theme_classic()

Univariate Time Series

But often in time series, we won’t want / need to plot the points:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = excited)) + 
    geom_line() + 
    theme_classic()

Take a moment and clean up this figure.

Univariate Time Series

One way to highlight increases is using trend lines:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(color = "grey", size = .9) + 
    geom_smooth(method = "lm", formula = y ~ poly(x,2)) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
      , caption = "y ~ poly(x, 2)"
    ) + 
    theme_classic() + 
    theme(
      axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.1))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
    )
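
`geom_smooth(method = "lm", formula = y ~ poly(x, 2))` draws an ordinary least-squares quadratic. As a sketch of what that means, the code below fits the same model directly with `lm()` on simulated data (the `beep` and `content` values are invented, not the participant's) and checks that the orthogonal-polynomial form traces the same curve as a raw `x + x^2` fit.

```r
# Simulated stand-in for one participant's series (hypothetical values)
set.seed(1)
sim <- data.frame(beep = 1:48)
sim$content <- 3 + 0.0015 * (sim$beep - 20)^2 + rnorm(48, sd = 0.4)

# The formula passed to geom_smooth(), with x = beep and y = content
fit_poly <- lm(content ~ poly(beep, 2), data = sim)

# Raw quadratic: a different parameterization of the same curve
fit_raw <- lm(content ~ beep + I(beep^2), data = sim)

# Compare the fitted values of the two parameterizations
all.equal(unname(fitted(fit_poly)), unname(fitted(fit_raw)))
```

The coefficients differ between the two fits because `poly()` uses orthogonal polynomials by default, but the plotted trend line is the same either way.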

Univariate Time Series

Another way to highlight changes is to use area:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content-1)) + 
    geom_area(fill = "purple4", alpha = .2) + 
    geom_line(color = "purple4", size = .9) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
    ) + 
    theme_classic() + 
    theme(
      axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.1))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
    )
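
The `content-1` shift combined with `labels = 1:5` is needed because `geom_area()` fills down to zero while the rating scale starts at 1. One alternative, sketched here on simulated ratings (`sim` is hypothetical), is `geom_ribbon()` with an explicit `ymin`, which leaves the data on its original scale.

```r
library(ggplot2)

# Simulated 1-5 ratings standing in for the participant's data
set.seed(2)
sim <- data.frame(beep = 1:48, content = sample(1:5, 48, replace = TRUE))

p <- ggplot(sim, aes(x = beep, y = content)) +
  # Fill from the scale minimum (1) up to the observed rating
  geom_ribbon(aes(ymin = 1, ymax = content), fill = "purple4", alpha = .2) +
  geom_line(color = "purple4") +
  scale_y_continuous(limits = c(1, 5), breaks = 1:5) +
  theme_classic()

p
```

Both approaches give the same picture; the ribbon version trades the relabeling trick for one extra aesthetic mapping.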

Univariate Time Series

Another way to highlight changes is to use area:

ipcs_data %>%
  filter(SID == c("02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content-1)) + 
    geom_area(fill = "purple4", alpha = .2) + 
    geom_line(color = "purple4", size = .9) + 
    geom_vline(xintercept = 28, linetype = "dashed", size = 1) + 
    annotate("text", label = "Week 1", x = 15, y = 0.1, hjust = .5) + 
    annotate("text", label = "Week 2", x = 40, y = 0.1, hjust = .5) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , title = "Contentedness increased in the second week"
      , subtitle = "Participant 2"
    ) + 
    theme_classic() + 
    theme(
      axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.1))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
    )

Multivariate Time Series

  • We can also apply the same principles to either:
    • the same variable across participants
    • different variables within the same participant

Multivariate Time Series

ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID)) + 
    theme_classic()

The same variable across participants

Oof, this looks a little rough

Multivariate Time Series

ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID)) + 
    scale_color_manual(values = c("darkblue", "orange3")) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , color = "Participant ID"
    ) + 
    theme_classic() +
    theme(
      legend.position = "bottom"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      )

The same variable across participants

Multivariate Time Series

ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID), size = .8) + 
    annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + 
    annotate("label", label = "Participant 5", color = "white", fill = "orange3", x = 18, y = 1.75, hjust = 0) + 
    scale_color_manual(values = c("darkblue", "orange3")) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , color = "Participant ID"
    ) + 
    theme_classic() + 
    theme(
      legend.position = "bottom"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      )

The same variable across participants

But what’s the story here?

Multivariate Time Series

ipcs_data %>%
  filter(SID %in% c("05", "02")) %>%
  select(SID, Full_Date, beep, excited
         , goaldir, content, guilty) %>%
  ggplot(aes(x = beep, y = content)) + 
    geom_line(aes(color = SID), size = .8) +
    geom_smooth(aes(color = SID), method = "lm", se = FALSE) +
    annotate("label", label = "Participant 02", color = "white", fill = "darkblue", x = 33, y = 2.5, hjust = 0) + 
    annotate("label", label = "Participant 5", color = "white", fill = "grey60", x = 18, y = 1.75, hjust = 0) + 
    scale_color_manual(values = c("darkblue", "grey60")) + 
    scale_x_continuous(limits = c(1,50), breaks = c(1,seq(5,50,5))) + 
    scale_y_continuous(limits = c(1,5), breaks = seq(1,5,1), labels = 1:5) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Reported Momentary Contentedness (1-5)"
      , color = "Participant ID"
      , title = "Both Participants' Contentedness Increased"
      , subtitle = "But Participant 5 remained more content on average"
    ) + 
    theme_classic() + 
    theme(
      legend.position = "bottom"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.1))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
      )

The same variable across participants
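
A title such as "Both Participants' Contentedness Increased" is a claim that can be checked against the numbers behind the `geom_smooth(method = "lm")` lines. A minimal sketch on simulated data follows; the participants "A" and "B", their values, and therefore the slopes and means are invented for illustration and say nothing about the real participants.

```r
library(dplyr)

# Two simulated participants with deterministic upward trends
sim <- tibble(
  SID = rep(c("A", "B"), each = 40),
  beep = rep(1:40, times = 2)
) %>%
  mutate(content = ifelse(SID == "A", 2.0, 3.0) + 0.02 * beep + 0.3 * sin(beep))

# One row per participant: linear slope over beeps and the overall mean
trend_summary <- sim %>%
  group_by(SID) %>%
  summarize(
    slope = unname(coef(lm(content ~ beep))[2]),
    mean_content = mean(content)
  )

trend_summary
```

If a slope turned out to be near zero or negative, the title would need to change rather than the plot.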

Multivariate Time Series

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    geom_line(aes(color = item)) + 
    theme_classic()

Different variables within the same participant
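
`pivot_longer()` is what lets a single `color = item` mapping draw one line per variable: it stacks the `goaldir` and `guilty` columns into one `value` column and records the source column name in `item`. A small sketch using the first three beeps printed earlier for participant 02 (`wide` and `long` are hypothetical names):

```r
library(tidyr)
library(tibble)

# First three beeps for SID 02, copied from the printed output
wide <- tibble(
  beep = 1:3,
  goaldir = c(5, 5, 2),
  guilty = c(2, 1, 2)
)

# Stack the two item columns into item/value pairs
long <- pivot_longer(
  wide
  , cols = c(goaldir, guilty)
  , names_to = "item"
  , values_to = "value"
)

# 3 beeps x 2 items = 6 rows, with columns beep, item, value
long
```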

Multivariate Time Series

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    geom_line(aes(color = item), size = .9) + 
    geom_point(size = .9) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    scale_color_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) + 
    theme_classic() + 
    theme(legend.position = "none")

Different variables within the same participant

Multivariate Time Series

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    annotate("rect", xmin = 13, xmax = 22, ymin = 1.8, ymax = 3.2, fill = "orchid", alpha = .3) + 
    geom_line(aes(color = item), size = .9) + 
    geom_point(size = .9) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    scale_color_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) + 
    theme_classic() + 
    theme(legend.position = "none")

Different variables within the same participant

Multivariate Time Series

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value)) + 
    annotate("rect", xmin = 13, xmax = 22, ymin = 1.8, ymax = 3.2, fill = "orchid", alpha = .3) + 
    geom_line(aes(color = item), size = .9) + 
    geom_point(size = .9) + 
    annotate("text", label = "Goal\nDirected", x = 50, y = 4, hjust = 0) + 
    annotate("text", label = "Guilty", x = 50, y = 2, hjust = 0) + 
    scale_color_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,57), breaks = seq(0,50,5)) +
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) + 
    theme_classic() + 
    theme(
      legend.position = "none"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.2))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
      )

Different variables within the same participant

Multivariate Time Series

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, goaldir, guilty) %>%
  pivot_longer(
    cols = c(goaldir, guilty)
    , names_to = "item"
    , values_to = "value"
  ) %>%
  ggplot(aes(x = beep, y = value-1)) + 
    geom_area(aes(fill = item), alpha = .4) + 
    geom_line(color = "orchid4", size = .9) + 
    geom_point(size = .9) + 
    scale_fill_manual(values = c("orchid4", "orchid2")) +
    scale_x_continuous(limits = c(1,50), breaks = seq(0,50,5)) +
    scale_y_continuous(limits = c(0,4), breaks = seq(0,4,1), labels = 1:5) + 
    labs(
      x = "Beep (1-50)"
      , y = "Self-Rated Momentary Value (1-5)"
      , title = "When goal-directedness was high, guilt was low"
      , subtitle = "Guilt was rarely equal to or higher than goal-directedness"
    ) + 
    facet_wrap(~item, nrow = 2) + 
    theme_classic() + 
    theme(
      legend.position = "none"
      , legend.text = element_text(face = "bold", size = rel(1.1))
      , legend.title = element_text(face = "bold", size = rel(1.1))
      , axis.text = element_text(face = "bold", color ="black", size = rel(1.1))
      , axis.title = element_text(face = "bold", size = rel(1.1))
      , plot.title = element_text(face = "bold", hjust = .5, size = rel(1.2))
      , plot.subtitle = element_text(face = "italic", hjust = .5, size = rel(1.1))
      , strip.background = element_rect(fill = "orchid4")
      , strip.text = element_text(face = "bold", size = rel(1.2), color = "white")
      )

Different variables within the same participant

Connected Scatter Plots

  • Connected scatter plots haven’t popped up in our visualizations so far because they require a lot of visual literacy and can be confusing if not executed incredibly carefully

Connected Scatter Plots

Let’s look at the negative and positive emotion composites using geom_path()

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, afraid:purposeful) %>%
  pivot_longer(
    cols = afraid:purposeful
    , names_to = "item"
    , values_to = "value"
  ) %>%
  mutate(valence = ifelse(item %in% c("afraid", "angry", "guilty"), "Negative", "Positive")) %>%
  group_by(SID, Full_Date, beep, valence) %>%
  summarize(value = mean(value, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(
    names_from = "valence"
    , values_from = "value"
  ) %>%
  arrange(beep) %>%
  ggplot(aes(x = Negative, y = Positive)) + 
    geom_path(aes(color = beep)) + 
    geom_point() + 
    scale_color_viridis_c() + 
    theme_classic() + 
    theme(legend.position = "bottom")

This isn’t convincing me. Maybe let’s try a different geom

Connected Scatter Plots

Let’s look at the negative and positive emotion composites using geom_segment()

ipcs_data %>%
  filter(SID == "02") %>%
  select(SID, Full_Date, beep, afraid:purposeful) %>%
  pivot_longer(
    cols = afraid:purposeful
    , names_to = "item"
    , values_to = "value"
  ) %>%
  mutate(valence = ifelse(item %in% c("afraid", "angry", "guilty"), "Negative", "Positive")) %>%
  group_by(SID, Full_Date, beep, valence) %>%
  summarize(value = mean(value, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(
    names_from = "valence"
    , values_from = "value"
  ) %>%
  arrange(beep) %>% # segments connect consecutive rows, so order by beep
  ggplot(aes(x = Negative, y = Positive, label = beep)) + 
    geom_segment(aes(
                    xend=c(tail(Negative, n=-1), NA)
                    , yend=c(tail(Positive, n=-1), NA)
                    , color = beep
                  )
                 , arrow = arrow(length = unit(0.4, "cm"))
      ) + 
    geom_point() + 
    scale_color_viridis_c() + 
    theme_classic()

This isn’t convincing me either, but what if we have a stronger correlation and fewer points?
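
The `xend` and `yend` expressions in the `geom_segment()` call shift each column up by one row, so every segment runs from the current point to the next one, and the trailing `NA` means the last point gets no arrow. A sketch showing that this is the same shift that `dplyr::lead()` performs, using made-up coordinates:

```r
library(dplyr)

# Hypothetical composite scores for five consecutive beeps
negative <- c(1.3, 1.0, 1.7, 1.3, 2.0)

# The idiom from the slide: drop the first element, pad the end with NA
shifted_tail <- c(tail(negative, n = -1), NA)

# dplyr's helper for the same "next value" shift
shifted_lead <- lead(negative)

# Check that both approaches give the same vector
identical(shifted_tail, shifted_lead)
```

Because the shift pairs each row with the one after it, the rows must already be in time order for the arrows to trace the actual sequence.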

Connected Scatter Plots

  • Remember our week 3 data, where we had self-rated health and other variables over years?